Conversation
b03b258 to
f8a6998
Compare
f8a6998 to
77553aa
Compare
|
The proposal looks great! its exactly what i have in mind to support as additional tools via |
|
@bwalderman, @mfreed7, and @bvandersloot, could you take a look? |
declarative-api-explainer.md
Outdated
| creates a new declarative WebMCP tool whose input schema is generated according to | ||
| [Input schema synthesis](#input-schema-synthesis). | ||
|
|
||
| We also introduce the new `toolparamname` and `toolparamdescription` attributes, which apply to form |
There was a problem hiding this comment.
Chromium currently uses toolparamtitle, not toolparamname
declarative-api-explainer.md
Outdated
|
|
||
| ```html | ||
| <form toolname="search-cars" tooldescription="Perform a car make/model search" [...]> | ||
| <input type=text toolparamname="make" toolparamdescription="The vehicle's make (i.e., BMW, Ford)" required> |
There was a problem hiding this comment.
FYI Here's what Chromium currently have:
- The optional
toolparamtitleattribute maps to the json-schema property key. If omitted, the browser defaults to the input element’s name. - The optional
toolparamdescriptionattribute maps to the property description within the json-schema. In its absence, the browser uses the textContent of the associated HTML element but skips descendants that are labelable or, if no label exists, the aria-description.
There was a problem hiding this comment.
FYI Here's what Chromium currently have:
- The optional
toolparamtitleattribute maps to the json-schema property key. If omitted, the browser defaults to the input element’s name.
We don't fall back to the name, we just omit the title field in that case.
I think we probably don't need the toolparamtitle attribute at all. It's not currently part of the proposal, and I suggest keeping it that way. (We will still want to synthesize the title field in some cases, but we can do that based on <label>, etc.)
There was a problem hiding this comment.
It's not currently part of the proposal, and I suggest keeping it that way.
Does this mean Chromium is getting rid of the toolparamtitle attribute? Otherwise, we certainly don't want Chrome to ship it, but the spec exclude it, right?
There was a problem hiding this comment.
@domfarolino Indeed, we wouldn't ship that attribute if it doesn't end up in the spec. The prototype in Chromium supports it right now only because it was part of the original/early proposal.
There was a problem hiding this comment.
Right, but since you're saying we shouldn't include it here, I'm wondering if you're recommending that because Chrome plans to rip it out of our implementation? Or do you think Chrome will keep it, and we should eventually put it in our spec proposal?
There was a problem hiding this comment.
I would like to rip it out, because there is something to be said for keeping this API as "low footprint" as possible. I understand that the author needs an "escape hatch" when <label> / ARIA is still insufficient to lead the agent on the right path (toolparamdescription). However, I don't think I see a need for two such escape hatches? HTML forms should already be reasonably semantically equipped to provide a meaningful title for things via <label> (etc).
<input ... toolparamtitle="foo" toolparamdescription="bar">
would synthesize to two free-text, author-provided fields right next to each other:
{
// ...
"title":"foo",
"description":"bar"
}
If "description":"bar" alone is not enough to guide the agent, then it seems to me like the author might just as well send "description":"Description that covers both foo and bar", instead of having to worry about what passes as a "title" and what passes as a "description".
Chrome plans
This is just my opinion right now, so I'm not sure if that amounts to a "Chrome plan" just yet. We would presumably follow whatever the group decision is. Until then we should be free to suggest a "default" in our proposal, and I'm suggesting that we omit it until someone asks for it. 🙂
01414c2 to
52143d7
Compare
| When a form element performs a navigation, the first `<script type=application/ld+json>` tag on the | ||
| target page is used as the cross-document tool's "response" that gets sent to the model. | ||
|
|
||
| When no such a tag is present, probably we'll decide that the page's entire contents is sent to the |
There was a problem hiding this comment.
Any thoughts on defaulting to a simple "form submitted successfully" message as the tool response? The signal to noise ratio of raw HTML can be huge and I'd argue it's better not to send it at all rather than send all of it.
"Accurate semantic representation" and "useful to the model" aren't the same thing. A content-heavy result page could easily blow out context in a way that's hard to predict or debug.
| 3. Two ways of getting a form response back to the agent that invoked the form tool: | ||
| 1. `SubmitEvent#respondWith()`, which lets JavaScript on the page override the default form | ||
| action, and pipe a response back to the agent without navigating the page. | ||
| 2. Extracting `<script type="application/json-ld">` tags on the page that the form navigated to, |
There was a problem hiding this comment.
I had some time to try out the JSON-LD cross document ergonomics this week and I'm leaning towards declarative tools not concerning themselves with cross-document outputs at all. I'd be curious to hear others' thoughts on this. The cross-document concern feels a bit self-inflicted and I found myself bypassing it in favor of read tools on the navigated-to page that the model can call on follow ups.
The model just needs to know: (a) did I fill the form correctly, if not what are the validation errors? (this does not cause a navigation) and (b) did it submit successfully
(this could just be a generic message). From there it can follow up with read tools on the destination page to verify context.
This would let us drop concerns around output schema for declarative cross document tools as well.
+1 to @bwalderman's <context> idea. That or something like it feels like the right primitive here rather than a cross-document output API bolted onto forms. This sort of thing would be useful in general rather than as a specific solution for form submissions
that cause navigations.
There was a problem hiding this comment.
I also have concerns about <script type="application/json-ld">. The <script> tag isn't exactly intended for this purpose to begin with, and it's being used only to solve for cross-document form submission. That's just one specific case of a more general problem: What happens when an agent needs to initialize or catch-up state, either because it started browsing a new page, or it used a tool (imperative or declarative) that navigated. Read tools are important for the agent to get this initial state after navigation, so I'm leaning towards figuring out the declarative approach for read tools, and then letting this be the solution for cross-document form submission. This is the reason why I brought the <context> tag from the VOIX framework to everyone's attention. We'll want something like it anyway for the declarative approach to be considered complete, and it also happens to be a reasonable solution for cross-document form submission.
It may not end up being called <context>. It would probably be <resource> to keep aligned with MCP, but the semantics would be the same.
There was a problem hiding this comment.
I also found the idea of vacuuming up "circumstantial" JSON-LD post-navigation and declaring it a "result" to a specific tool execution very questionable.
I'm leaning towards figuring out the declarative approach for read tools, and then letting this be the solution for cross-document form submission.
@bwalderman In that world, the result of a declarative tool execution would indeed be something like "form submitted successfully" like @MiguelsPizza says? And then other side-effects of the tool execution would be understood by the agent from "read tools" it would anyway execute after any navigation?
There was a problem hiding this comment.
@andruud that is my interpretation as well. Rather than try to work around loss of state due to navigation, accept this as a limitation and let a simple "form success" message be a cue to the agent that it should pull new state.
| }; | ||
| ``` | ||
|
|
||
| **`toolactivated` and `toolcanceled` events |
There was a problem hiding this comment.
when is the toolcanceled event dispatched?
This is a follow-up of #26 to address #22. Note that #26 has been superseded by this PR, per #26 (comment).